Noise Stability of Transformer Models
Haris, Themistoklis, Zhang, Zihan, Yoshida, Yuichi
Understanding simplicity biases in deep learning offers a promising path toward developing reliable AI. A common metric for this, inspired by Boolean function analysis, is average sensitivity, which captures a model's robustness to single-token perturbations. We argue that average sensitivity has two key limitations: it lacks a natural generalization to real-valued domains and fails to explain the "junta-like" input dependence we empirically observe in modern LLMs. To address these limitations, we propose noise stability as a more comprehensive simplicity metric. Noise stability expresses a model's robustness to correlated noise applied to all input coordinates simultaneously. We provide a theoretical analysis of noise stability for single-layer attention and ReLU MLP layers, and tackle the multi-layer propagation problem with a covariance interval propagation approach. Building on this theory, we develop a practical noise stability regularization method. Experiments on algorithmic and next-token-prediction tasks show that our regularizer consistently catalyzes grokking and accelerates training by approximately 35% and 75%, respectively.

Simplicity biases have been a promising direction of study in recent years (Shah et al., 2020; Vasudeva et al., 2024; Bhattamishra et al., 2022), as they provide a unifying framework for generalization, interpretability, and robustness. Neural networks, including Large Language Models (LLMs), often converge to the simplest possible functions that explain the training data.
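In the Boolean-analysis setting this abstract draws on, noise stability at correlation rho is Stab_rho[f] = E[f(x) f(y)], where y is a rho-correlated copy of x. A minimal Monte-Carlo sketch of this quantity (the function names and sampling scheme are illustrative, not the paper's implementation):

```python
import numpy as np

def noise_stability(f, n_bits, rho, n_samples=20000, seed=0):
    """Monte-Carlo estimate of Stab_rho[f] = E[f(x) * f(y)] over
    x uniform in {-1,+1}^n and y a rho-correlated copy of x:
    each coordinate of y is kept equal to x_i with probability rho
    and resampled uniformly otherwise, so E[x_i * y_i] = rho."""
    rng = np.random.default_rng(seed)
    x = rng.choice([-1, 1], size=(n_samples, n_bits))
    keep = rng.random((n_samples, n_bits)) < rho
    fresh = rng.choice([-1, 1], size=(n_samples, n_bits))
    y = np.where(keep, x, fresh)
    return float(np.mean(f(x) * f(y)))

# The "dictator" function f(x) = x_1 has Stab_rho[f] = rho exactly,
# which makes it a handy sanity check for the estimator.
dictator = lambda x: x[:, 0]
```

A function that depends on only a few coordinates (a junta) keeps its stability high for moderate rho, which is the intuition behind treating noise stability as a simplicity metric.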
Noise Sensitivity and Stability of Deep Neural Networks for Binary Classification
Jonasson, Johan, Steif, Jeffrey E., Zetterqvist, Olof
The driving question of this paper is how robust a typical binary neural net classifier is to input noise: for a typical classifier and a typical input, will tiny changes to that input make the classifier change its mind? When asking this, we take inspiration from phenomena observed for deep neural networks (DNNs) used in practice, and use that inspiration to give mathematically rigorous answers for some simple DNN models under one (of several possible) reasonable interpretations of the question. Familiarity with DNNs is not a prerequisite for finding the topic interesting, and any machine-learning lingo will be explained shortly. DNNs have shown results that range from good to staggering in many different data-driven areas, e.g. prediction and classification. One of many reasons for this is that with sufficiently large models, neural networks can approximate any function [5].
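The question can be phrased quantitatively as noise sensitivity: the probability that flipping each input coordinate independently with probability delta changes the predicted class. A toy sketch, with a random one-hidden-layer net standing in for a "typical" classifier (all names, sizes, and the weight distribution here are illustrative assumptions, not the paper's models):

```python
import numpy as np

def random_relu_classifier(n_in=20, n_hidden=64, seed=0):
    """A random one-hidden-layer binary classifier: a hypothetical
    stand-in for a 'typical' DNN with Gaussian weights."""
    g = np.random.default_rng(seed)
    W = g.normal(size=(n_in, n_hidden)) / np.sqrt(n_in)
    v = g.normal(size=n_hidden) / np.sqrt(n_hidden)
    return lambda x: np.sign(np.maximum(x @ W, 0.0) @ v)

def noise_sensitivity(clf, n_in, delta, n=20000, seed=1):
    """P[clf(x) != clf(y)], where y flips each coordinate of x
    independently with probability delta."""
    r = np.random.default_rng(seed)
    x = r.choice([-1.0, 1.0], size=(n, n_in))
    flip = r.random((n, n_in)) < delta
    y = np.where(flip, -x, x)
    return float(np.mean(clf(x) != clf(y)))

clf = random_relu_classifier()
```

Sensitivity is zero at delta = 0 and grows with delta; how fast it grows for typical random nets is exactly the kind of question the paper answers rigorously.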
Understanding Influence Functions and Datamodels via Harmonic Analysis
Saunshi, Nikunj, Gupta, Arushi, Braverman, Mark, Arora, Sanjeev
It is often of great interest to quantify how the presence or absence of a particular training data point affects the trained model's performance on test data points. Influence functions are a classical idea for this [Jaeckel, 1972, Hampel, 1974, Cook, 1977] that has recently been adapted to modern deep models and large datasets [Koh and Liang, 2017]. Influence functions have been applied to explain predictions and produce confidence intervals [Schulam and Saria, 2019], investigate model bias [Brunet et al., 2019, Wang et al., 2019], estimate Shapley values [Jia et al., 2019, Ghorbani and Zou, 2019], improve human trust [Zhou et al., 2019], and craft data poisoning attacks [Koh et al., 2019]. Influence admits several formalizations. The classic calculus-based estimate (henceforth referred to as continuous influence) conceptualizes the training loss as a weighted sum over training datapoints, where the weight of a particular datapoint z can be varied infinitesimally.
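The continuous-influence idea can be made concrete in one dimension by taking (for illustration only) the estimator to be the minimizer of a weighted squared loss, i.e. the weighted mean, whose classical influence function is IF(z) = z - mu:

```python
import numpy as np

def estimator(data, weights):
    # Minimizer of sum_i w_i * (theta - z_i)^2: the weighted mean.
    return np.average(data, weights=weights)

def continuous_influence(data, idx, eps=1e-6):
    """Finite-difference version of the calculus-based ("continuous")
    influence: the derivative of the estimator as datapoint idx is
    upweighted infinitesimally (np.average renormalizes weights)."""
    n = len(data)
    w = np.full(n, 1.0 / n)
    w_up = w.copy()
    w_up[idx] += eps
    return (estimator(data, w_up) - estimator(data, w)) / eps

data = np.array([1.0, 2.0, 3.0, 6.0])  # sample mean is 3.0
```

Upweighting the outlier 6.0 should move the mean with influence close to 6.0 - 3.0 = 3.0, matching the closed form IF(z) = z - mu; for deep models the same derivative is approximated with Hessian-vector products rather than retraining.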
Using noise resilience for ranking generalization of deep neural networks
Morwani, Depen, Vashisht, Rahul, Ramaswamy, Harish G.
Recent papers have shown that sufficiently overparameterized neural networks can perfectly fit even random labels. It is therefore crucial to understand the underlying reason behind a network's generalization performance on real-world data. In this work, we propose several measures to predict the generalization error of a network given the training data and its parameters. Using one of these measures, based on the noise resilience of the network, we secured 5th position in the Predicting Generalization in Deep Learning (PGDL) competition at NeurIPS 2020.
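One simple way to turn "noise resilience" into a number (a sketch only; the competition measures are more involved) is the average accuracy drop when Gaussian noise is added to the parameters, here shown for a hypothetical linear classifier:

```python
import numpy as np

def linear_predict(params, X):
    (w,) = params
    return np.sign(X @ w)

def noise_resilience_drop(predict, params, X, y, sigma=0.5, trials=20, seed=0):
    """Average accuracy drop when Gaussian noise of scale sigma is
    added to the parameters; a smaller drop means the network sits
    in a flatter, more noise-resilient region."""
    rng = np.random.default_rng(seed)
    base = np.mean(predict(params, X) == y)
    drops = []
    for _ in range(trials):
        noisy = [p + sigma * rng.normal(size=p.shape) for p in params]
        drops.append(base - np.mean(predict(noisy, X) == y))
    return float(np.mean(drops))

# Toy separable data (illustrative): labels from a known weight vector.
rng = np.random.default_rng(2)
X = rng.choice([-1.0, 1.0], size=(200, 11))
y = np.sign(X @ np.ones(11))
```

Scaling the weights up leaves the predictions unchanged but shrinks the relative effect of fixed-scale parameter noise, so the large-margin copy of the same classifier scores as more resilient.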
Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets
Kuditipudi, Rohith, Wang, Xiang, Lee, Holden, Zhang, Yi, Li, Zhiyuan, Hu, Wei, Arora, Sanjeev, Ge, Rong
Efforts to understand how and why deep learning works have led to a focus on the optimization landscape of training loss. Since optimization to near-zero training loss occurs for many choices of random initialization, it is clear that the landscape contains many global optima (or near-optima). However, the loss can become quite high when interpolating between found optima, suggesting that these optima occur at the bottom of "valleys" surrounded on all sides by high walls. Therefore the phenomenon of mode connectivity (Garipov et al., 2018; Draxler et al., 2018) came as a surprise: optima (at least the ones discovered by gradient-based optimization) are connected by simple paths in the parameter space, on which the loss function is almost constant. In other words, the optima are not walled off in separate valleys as hitherto believed.
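Mode connectivity is commonly quantified by the loss barrier along the straight line between two solutions: a near-zero barrier means the optima are (at least linearly) connected rather than walled off in separate valleys. A minimal sketch with toy loss functions (the losses and names here are illustrative, not the paper's networks):

```python
import numpy as np

def loss_barrier(loss_fn, theta_a, theta_b, n_points=21):
    """Max loss along the linear path between two parameter vectors,
    minus the worse of the two endpoints. A large positive barrier
    means a 'wall' between the optima; ~0 means they are connected."""
    ts = np.linspace(0.0, 1.0, n_points)
    path = [loss_fn((1.0 - t) * theta_a + t * theta_b) for t in ts]
    return max(path) - max(path[0], path[-1])

convex_loss = lambda th: float(np.sum(th ** 2))              # one valley
ring_loss = lambda th: float((np.sum(th ** 2) - 1.0) ** 2)   # minima on the unit circle
```

For the convex loss the barrier is zero by convexity, while for the ring loss the straight line between two antipodal minima passes through the origin, where the loss rises to 1; real loss landscapes avoid such barriers along curved, not straight, paths.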
Identity Connections in Residual Nets Improve Noise Stability
Residual Neural Networks (ResNets) achieve state-of-the-art performance in many computer vision problems. Compared to plain networks without residual connections (PlnNets), ResNets train faster, generalize better, and suffer less from the so-called degradation problem. We introduce simplified (but still nonlinear) versions of ResNets and PlnNets for which these discrepancies still hold, although to a lesser degree. We establish a one-to-one mapping between simplified ResNets and simplified PlnNets, and show that they are exactly equivalent to each other in expressive power at the same computational complexity. We conjecture that ResNets generalize better because they have better noise stability, and support this conjecture empirically for both simplified and fully-fledged networks.
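The measurement behind this conjecture can be sketched by comparing the relative output change of a plain versus a residual ReLU stack under small input noise. The architectures and scales below are illustrative, not the paper's simplified models, and the sketch only sets up the measurement; whether the residual variant changes less is the empirical question:

```python
import numpy as np

def stack(x, Ws, residual):
    """Apply a stack of ReLU layers; residual=True adds identity
    connections, mapping h -> h + relu(h @ W)."""
    h = x
    for W in Ws:
        z = np.maximum(h @ W, 0.0)
        h = h + z if residual else z
    return h

def relative_output_change(Ws, residual, sigma, n=500, d=32, seed=0):
    """||f(x + eps) - f(x)|| / ||f(x)|| under Gaussian input noise of
    scale sigma: a simple proxy for (lack of) noise stability."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, d))
    eps = sigma * rng.normal(size=(n, d))
    y0 = stack(x, Ws, residual)
    y1 = stack(x + eps, Ws, residual)
    return float(np.linalg.norm(y1 - y0) / np.linalg.norm(y0))

# Shared random weights so the two architectures are directly comparable.
wrng = np.random.default_rng(3)
Ws = [wrng.normal(size=(32, 32)) / np.sqrt(32) for _ in range(4)]
```

Because the weights are shared, any difference between `relative_output_change(Ws, True, sigma)` and `relative_output_change(Ws, False, sigma)` is attributable to the identity connections alone.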